Engineering the LOUDS Succinct Tree Representation

Authors

  • O'Neil Delpratt
  • Naila Rahman
  • Rajeev Raman
Abstract

Ordinal trees are arbitrary rooted trees where the children of each node are ordered. We consider succinct, or highly space-efficient, representations of (static) ordinal trees with n nodes that use 2n + o(n) bits of space. There are a number of such representations: each supports a different set of tree operations in O(1) time on the RAM model. In this paper we focus on the practical performance of the fundamental Level-Order Unary Degree Sequence (LOUDS) representation [Jacobson, Proc. 30th FOCS, 549–554, 1989]. Due to its conceptual simplicity, LOUDS would appear to be a representation with good practical performance. A tree can also be represented succinctly as a balanced parenthesis sequence [Munro and Raman, SIAM J. Comput. 31 (2001), 762–776; Jacobson, op. cit.; Geary et al. Proc. 15th CPM Symp., LNCS 3109, pp. 159–172, 2004]. In essence, the two representations are complementary, and have only the basic navigational operations in common (parent, first-child, last-child, prev-sibling, next-sibling). Unfortunately, a naive implementation of LOUDS is not competitive with the parenthesis implementation of Geary et al. on the common set of operations. We propose variants of LOUDS, of which one, called LOUDS++, is competitive with the parenthesis representation. A motivation is the succinct representation of large static XML documents, and our tests involve traversing XML documents in various canonical orders.
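
To make the LOUDS idea concrete, here is a minimal Python sketch of the textbook encoding and the five common navigational operations named in the abstract. It is an illustration under stated assumptions, not the paper's implementation and not LOUDS++: a super-root "10" is prepended, each node is written in level order as its degree in unary followed by a 0, and a node is identified by the position of its 1 bit. The names `louds_encode`, `NaiveLouds`, `children`, and `label` are hypothetical, and rank/select are linear scans here, whereas a succinct implementation would answer them in O(1) time using o(n)-bit directories.

```python
from collections import deque

def louds_encode(tree, root):
    """Encode a tree (dict: node -> ordered list of children) as LOUDS bits.
    Returns (bits, order), where order lists the nodes in level order."""
    bits, order = [1, 0], []          # super-root "10": its single 1 is the root
    queue = deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        kids = tree.get(node, [])
        bits.extend([1] * len(kids))  # degree in unary...
        bits.append(0)                # ...terminated by a 0
        queue.extend(kids)
    return bits, order

class NaiveLouds:
    """Navigation over a LOUDS bit list. A node is the 0-based position of its
    1 bit. rank/select are O(n) scans (assumption: clarity over speed)."""
    def __init__(self, bits):
        self.bits = bits

    def rank(self, bit, i):
        # Number of occurrences of `bit` in bits[0..i], inclusive.
        return sum(1 for b in self.bits[:i + 1] if b == bit)

    def select(self, bit, k):
        # Position of the k-th occurrence of `bit` (k is 1-based).
        seen = 0
        for pos, b in enumerate(self.bits):
            if b == bit:
                seen += 1
                if seen == k:
                    return pos
        raise ValueError("fewer than k occurrences")

    def first_child(self, x):
        # The description of the node with level-order number k
        # starts right after the k-th 0.
        y = self.select(0, self.rank(1, x)) + 1
        return y if self.bits[y] == 1 else None

    def last_child(self, x):
        # That description ends at the (k+1)-th 0.
        y = self.select(0, self.rank(1, x) + 1) - 1
        return y if self.bits[y] == 1 else None

    def next_sibling(self, x):
        return x + 1 if self.bits[x + 1] == 1 else None

    def prev_sibling(self, x):
        return x - 1 if x > 0 and self.bits[x - 1] == 1 else None

    def parent(self, x):
        # The number of 0s before x is the level-order number of the parent.
        j = self.rank(0, x)
        return None if j == 0 else self.select(1, j)

def children(louds, x):
    """Yield the children of node x from left to right."""
    c = louds.first_child(x)
    while c is not None:
        yield c
        c = louds.next_sibling(c)

# Example tree: r has children a, b, c; a has children d, e; c has child f.
tree = {"r": ["a", "b", "c"], "a": ["d", "e"], "c": ["f"]}
bits, order = louds_encode(tree, "r")
louds = NaiveLouds(bits)

def label(x):
    # Level-order number of the node at position x is rank1(x).
    return order[louds.rank(1, x) - 1]

encoded = "".join(map(str, bits))  # "101110110010000": 2n + 1 = 15 bits, n = 7
```

In this sketch every operation reduces to a constant number of rank and select calls plus bit probes, which is why the cost of rank and select dominates the practical performance of a LOUDS implementation.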

Similar resources

Succinct Trees in Practice

We implement and compare the major current techniques for representing general trees in succinct form. This is important because a general tree of n nodes is usually represented in pointer form, requiring O(n log n) bits, whereas the succinct representations we study require just 2n + o(n) bits and carry out many sophisticated operations in constant time. Yet, there is no exhaustive study in th...

A Succinct N-gram Language Model

Efficient processing of tera-scale text data is an important research topic. This paper proposes lossless compression of N-gram language models based on LOUDS, a succinct data structure. LOUDS succinctly represents a trie with M nodes as a (2M + 1)-bit string. We compress it further for the N-gram language model structure. We also use ‘variable length coding’ and ‘block-wise compression’ to comp...

Compression of double array structures for fixed length keywords

A trie is a data structure for keyword matching. It is used in natural language processing, IP address routing, and so on. It can be represented by the matrix form, the link form, the double array, and LOUDS. The double array combines the retrieval speed of the matrix form with the compactness of the list form. LOUDS is a succinct data structure based on a bit string. Retrieval speed of LOUDS is n...

Succinct representation of labeled trees

We give a representation for labeled ordered trees that supports labeled queries such as finding the i-th ancestor of a node with a given label. Our representation is succinct, namely the redundancy is small-o of the optimal space for storing the tree. This improves the representation of He et al. [ISAAC 2012] which is succinct only when the entropy of the labels is ω(1).

Ultra-succinct representation of ordered trees with applications

There exist two well-known succinct representations of ordered trees: BP (balanced parenthesis) (Munro and Raman, 2001) [20] and DFUDS (depth first unary degree sequence) (Benoit et al., 2005) [1]. Both have size 2n + o(n) bits for n-node trees, which asymptotically matches the information-theoretic lower bound. Importantly, many fundamental operations on t...

Publication date: 2006